Chimpanzee and gorilla chromosomes differ from human chromosomes by the presence of large blocks of subterminal 
heterochromatin thought to be composed primarily of arrays of tandem satellite sequence. We explore their sequence 
composition and organization and show a complex organization composed of specific sets of segmental duplications that 
have hyperexpanded in concert with the formation of subterminal satellites. These regions are highly copy number 
polymorphic between and within species, and copy number differences involving hundreds of copies can be accurately 
estimated by assaying read-depth of next-generation sequencing data sets. Phylogenetic and comparative genomic 
analyses suggest that the structures have arisen largely independently in the two lineages with the exception of a few seed 
sequences present in the common ancestor of humans and African apes. We propose a model where an ancestral human- 
chimpanzee pericentric inversion and the ancestral chromosome 2 fusion both predisposed and protected the chimpanzee 
and human genomes, respectively, to the formation of subtelomeric heterochromatin. Our findings highlight the complex 
interplay between duplicated sequences and chromosomal rearrangements that rapidly alter the cytogenetic landscape in 
a short period of evolutionary time. 

[Supplemental material is available for this article.] 



Chimpanzee and gorilla chromosomes differ cytogenetically from 
human chromosomes by 11 large-scale rearrangements (nine paracentric
and pericentric inversions, one translocation, and one fusion)
and by the presence of additional terminal G-bands adjacent
to the telomere. These subtelomeric caps are heterochromatic in 
nature and are completely absent from the karyotype of human 
and orangutan (Haaf and Schmid 1987; IJdo et al. 1991; Ventura 
et al. 2011). Subterminal heterochromatin has been thought to 
be composed primarily of a tandem array of a 32-bp subterminal 
satellite (StSat) creating large subterminal blocks of constitutive 
heterochromatic regions (Royle et al. 1994; Koga et al. 2011) ad- 
jacent to the canonical telomeric TTAGGG sequence (Greider and 
Blackburn 1989). While almost all gorilla chromosomes show the 
presence of subterminal caps, only half of chimpanzee chromo- 
somes possess such structures (Fan et al. 2002). In both chimpan- 
zee and pygmy chimpanzee, this process has also created islands of 
interstitial heterochromatin on both chromosomes VII and XIII 
(Royle et al. 1994). 

Due to the high-copy repetitive nature of these regions, the
subtelomeric heterochromatin, like centromeric regions and secondary
constrictions on acrocentric chromosomes, is not represented in
the existing genome assemblies of the chimpanzee (The Chimpanzee
Sequencing and Analysis Consortium 2005) and gorilla.
Unlike centromeric satellite sequences, which were well-charac- 
terized prior to the era of full genome sequencing (Rudd and 
Willard 2004), there are relatively few detailed molecular studies 
of the organization or evolution of the subterminal caps (Royle 
et al. 1994). Conservatively, it has been estimated by dot-blot analysis
that the subterminal heterochromatin constitutes >3 Mbp (0.1%)
of the total genomic DNA of each species (Yunis and Prakash
1982; Koga et al. 2011).

Subtelomeric regions are more generally recognized as ex- 
tremely dynamic regions of chromosomes (Mefford and Trask 
2002; Prieto et al. 2004; Gonzalez-Garcia et al. 2006; Carreto et al. 
2008; Nieves et al. 2011). In humans, large complex blocks of
duplicated sequences (zones of subtelomeric duplication) typically
define the last 50-150 kbp of human chromosomes (Mefford
and Trask 2002; Riethman et al. 2005). It has been postulated that
this genomic dynamism occurs during meiotic prophase when all 
the chromatids interconnect, allowing for nonhomologous chro- 
mosome exchange of chromosome ends (Wallace and Hulten 1985). 
This contact may explain why specific subtelomeric regions pref- 
erentially associate and share a high degree of sequence identity 
despite mapping to nonhomologous chromosomes (Hirai et al. 
2005). The high sequence identity among subtelomeric regions 
has been an important consideration in understanding changes in 
the chromatin environment related to some human genomic disorders
(Gerber et al. 2011; Kudo et al. 2011; van der Maarel et al. 2011).

Recently, we identified the presence of specific segmental
duplications that have hyperexpanded within the subterminal 
cap of both gorilla and chimpanzee (Cheng et al. 2005; Marques- 
Bonet et al. 2009; Ventura et al. 2011). In this study we perform 
a detailed investigation into the organization and evolution of 
subterminal caps using molecular cytogenetics, targeted clone- 
based sequencing, and phylogenetic-based analyses. Based on the 
evolutionary history of hominid chromosomal rearrangements, 
we develop a model to explain the distribution and organization 
of subtelomeric caps in gorilla and chimpanzee, as well as their 
absence in humans. We hypothesize that ancestral chromosomal 
rearrangements were important in catalyzing the spread of these 
chromosome structures and that the fusion of human chromo- 
some 2 squelched this process in humans by eliminating the 
source of the StSat sequence shortly after human speciation. 

Results 

Subtelomeric cap distribution among gorilla 
and chimpanzee chromosomes 

In order to test the localization of telomeric TTAGGG sequences 
with respect to the 32-bp StSat, we began by performing bicolor 
fluorescent in situ hybridization (FISH) experiments using bio- 
tinylated oligonucleotides designed to the satellite sequence in 
conjunction with a commercial PNA (peptide nucleic acid) corre- 
sponding to telomeric TTAGGG sequence (Methods). As expected, 
the telomeric satellite colocalizes with "caps" in both species, 
mapping immediately proximal to the PNA telomere probe (Supplemental
Fig. S1). We were able to identify all chromosome arms
carrying StSat in chimpanzee (42/96 chromosome tips and interstitial
signals on chromosomes VII and XIII) and in gorilla (80/96
chromosome tips). Cap signals were detected on short arms of IIp and
IIq chromosomes in chimpanzee and on both arms of IIq in gorilla.
No traces of signals were observed on the long arms of chimpanzee
IIp and IIq or on the short arm of gorilla IIp. For gorilla, in particular,
we did not detect signals on any of the p arms of the acrocentric
chromosomes (IIp, IX, XIII, XIV, XV, XVIII, XXI, and XXII), except for
chromosome IIq showing signals on the p and q arms.

Satellite III (SatIII) DNA and ribosomal DNA (rDNA) have
been reported to localize to the short arm of acrocentric chromosomes
in chimpanzee and gorilla (Jarmuz et al. 2007). In order
to understand the chromosomal distribution of these high-copy
repeats, we performed three-color FISH experiments in chimpanzee
and gorilla using StSat, rDNA, and SatIII (group 1 and group 2)
(Bandyopadhyay et al. 2001b) specific probes. In gorilla, we found
that all chromosome termini or arms hybridized strongly to one
of these probes, but never did these probes colocalize to the same
arm (Fig. 1). In chimpanzee, not all chromosomes showed hybridization
signals for satellite sequences and, as in gorilla, StSat
probes never colocalized with either rDNA or SatIII probes. Both
rDNA and SatIII probes, however, did colocalize on chromosomes
XIIIpter and XIVpter. These results and the reduced signal intensity
suggest a less pronounced spreading of satellite sequences
to the tips of chimpanzee chromosomes when compared to gorilla
(Fig. 1). In chimpanzee, we note that not all chromosomal
ends are capped by StSat or another form of satellite. These chromosomal
ends that stain G-negative are more similar to the terminal
chromosomal structures found in human and orangutan.

Molecular cytogenetic analysis of subterminal regions 

We previously identified two regions of segmental duplication that 
mapped to the subterminal caps and had hyperexpanded to 
hundreds of copies in both chimpanzee and gorilla. This included 
a 36-kbp chimpanzee-hyperexpanded segment homologous to 
human chromosome 2, contained in the chimpanzee clone 
AC150905, and a 45-kbp gorilla-hyperexpanded segment homologous
to human chromosome 10 (19534885-19579478 NCBI35)
(Fan et al. 2002; Cheng et al. 2005; Marques-Bonet et al. 2009; 
Ventura et al. 2011). To better define the cap structure organization 
in both apes and delineate the full extent of these duplications, we 
performed a series of comparative FISH experiments using a set of 
overlapping human fosmid clones (Kidd et al. 2010) mapping to 
a 463-kbp region of human chromosome 2 (113840263-114303469)
and a 341-kbp region mapping to chromosome 10 (19359773- 
19700390) (Table 1; Fig. 2). Signals were distinguished as map- 
ping to the subterminal cap or immediately proximal to the het- 
erochromatic cap, termed "subcap" for the purpose of this study. 
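
The region sizes quoted above can be checked directly from the coordinates. The following is a minimal sketch, assuming the span length is simply end minus start on NCBI build 35; the dictionary labels and the helper name `span_kbp` are illustrative and do not come from the study.

```python
# Coordinates as quoted in the text (human NCBI build 35).
# The labels are illustrative names, not identifiers used by the authors.
REGIONS = {
    "chr2 fosmid tiling path": (113_840_263, 114_303_469),
    "chr10 fosmid tiling path": (19_359_773, 19_700_390),
    "chr10 gorilla-hyperexpanded segment": (19_534_885, 19_579_478),
}


def span_kbp(start: int, end: int) -> float:
    """Length of a coordinate span in kbp (assumed convention: end - start)."""
    return (end - start) / 1000.0


for name, (start, end) in REGIONS.items():
    print(f"{name}: {span_kbp(start, end):.0f} kbp")
```

Under this convention the three spans round to 463, 341, and 45 kbp, matching the sizes given in the text.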

Based on differences in the pattern of hybridization, we defined
three main areas from human chromosome 2 (2B, C, and E)
in chimpanzee and four regions from human chromosome 10
(10A-D) in gorilla as part of the subterminal regions. We note
that fosmid clones corresponding to region 2D did not show
any signal to either cap or subcap regions (Table 1; Fig. 3A,B). Due
to the close proximity of the regions and the compact nature
of subterminal heterochromatin, FISH experiments were generally
uninformative in resolving the order of the probes. Thus, we defined
the organization of these regions assuming that higher-copy
probes mapped more distally than lower-copy-number probes
based on studies of the organization of human subtelomeric regions
(Mefford and Trask 2002; Linardopoulou et al. 2005; Rudd
et al. 2009). This assumption was subsequently validated by BAC-based
clone sequencing (see below). FISH experiments on metaphase
and interphase nuclei and stretched chromosomes using
StSat and human cap-specific fosmid probes (2B and 10C) showed
a clear interdigitation of satellite sequences and chromosome 2
and 10 probes in subtelomeric cap-associated regions of chimpanzee
and gorilla, respectively (Fig. 3C).

Figure 1. Three-color FISH experiment using StSat, SatIII, and rDNA on chimpanzee and gorilla chromosomes. SatIII and rDNA never colocalize with
the StSat in either species. (A) In chimpanzee, however, rDNA and SatIII probes strongly colocalize on chromosomes XIIIpter (white arrows) and XIVpter
(yellow arrows). (B) In gorilla, all chromosomal tips are positive for one of the three forms of the satellite.

Figure 2. Duplication and fosmid probe map. The pattern of segmental duplications is shown for (A) a 463-kbp region of human chromosome 2
(chr2:113840263-114303469) and (B) a 341-kbp region mapping to chromosome 10 (chr10:19359773-19700390) based on human genome annotation
(NCBI build35). Computationally predicted human, chimpanzee, and gorilla duplications (red = excess depth-of-coverage of aligned whole-genome
shotgun sequence) as well as a heat map indicating the copy number are displayed. Fosmid clone contigs are reported below each genomic
region. Human fosmid probes underlying each region are shown and grouped (2A-E and 10A-D) based on the pattern of hybridization to chimpanzee and
gorilla subtelomeric cap and subcap regions. (HSA) Homo sapiens, (PTR) Pan troglodytes, (GGO) Gorilla gorilla.

Figure 3. Schematic representation of the organization of subtelomeric regions. (A) Cap and subcap regions have been reported according to FISH
experiments and copy number count in chimpanzee (top lines) and gorilla (bottom lines). (B) Extracted chromosomes representing examples of all the
different organizations observed in the subterminal regions of chimpanzee and gorilla. (C) FISH on stretched chromosomes showing the interdigitation
between StSat and regions 2B and 10C in chimpanzee and gorilla, respectively.

There were three important observations. First, we found that 
probes mapping to the subterminal caps hybridized much more 
intensely than probes mapping to the subcap, irrespective of the 
species analyzed. We interpreted this as representing increases in 
copy number of these regions in the cap when compared to the 
subcap of African great ape chromosomes. Second, we observed 
a remarkable reciprocity in cap and subcap locations between go- 
rilla and chimpanzee. For example, probes ABC8_40925900_F12 
and ABC8_41134100_D22 (Fig. 2, region 2B) hybridized exclu- 
sively to the caps in chimpanzee but only to the subcaps in gorilla. 
Similarly, probe ABC8_40868200_C16 (Fig. 2, region 10C) showed
strong hybridization signals to the caps in gorilla but very weak 
signals to most chimpanzee subcaps (Table 1). Third, the subcap 
hyperexpansions were accompanied by fixed, large-scale deletions 
of the ancestral locus where the duplicated sequences originated. 
We identified a 75-kbp deleted segment in gorilla mapping to hu- 
man chromosome 2 (Fig. 2, region 2A) and a 76-kbp region in 
chimpanzee mapping to human chromosome 10 (chr10:19359773-19435714)
(Table 1). Both deleted regions showed no traces of
RefSeq genes but displayed predictions of genes in the area (Fig. 2). 

In order to understand the fate of the most terminal portions
of the p-arm of chromosomes IIp and IIq in chimpanzee and
gorilla following the fusion, we performed additional three-color
hybridization experiments using BAC probes corresponding to
human chromosome 2p and 2q (RP11-301C20: chr2:35,539,435-35,655,555
and RP11-361O8: chr2:165,546,718-165,724,972, respectively)
along with fosmid probes corresponding to the region
of interest. This approach was essential to discriminate the acrocentric
and almost cytogenetically indistinguishable chromosomes
IIp and IIq in both species. We observed no signals to the
chimpanzee subcap locations using fosmid clones from the chromosome
2 region (Fig. 2), while gorilla displayed subcap signals
using both chromosomes 2 and 10 fosmid clones (Supplemental
Fig. S2; Supplemental Table S1). It is noteworthy that region 2A
showed signals on the orthologous IIp and pericentromeric regions
on chromosome IX in chimpanzee, while no signals were observed
at the orthologous gorilla chromosome IIp. Gorilla showed only
signals on chromosome IX and likewise for macaque, which was
used as representative of Old World monkeys (data not shown).
These results date the duplication from chromosome IX to chromosome
II after the divergence of gorilla from the chimpanzee-human
ancestor.

Chimpanzee interstitial heterochromatin 

We performed additional analyses to localize the interstitial het- 
erochromatic regions in the chimpanzee genome. We selected a 
set of overlapping human fosmid clones corresponding to the 
previously reported region of insertion (Gross et al. 2006) and 
used FISH to more precisely localize the integration sites of the 
heterochromatic chimpanzee blocks on chromosomes VII and 
XIII. We localized the insertion breakpoint of the chromosome 
VII heterochromatic block to a 12.3-kbp interval (as defined by 
probes ABC8_40890400_C6 at chr7:101930395-101963680 and
ABC8_720140_F22 at chr7:101951324-101986595) and the chromosome
XIII breakpoint to a 5.9-kbp interval (as defined by
probes ABC8_2114240_I22 at chr13:45906763-45941160 and
ABC8_43215900_F22 at chr13:45935232-45970371) (Supplemental
Table S2; Supplemental Fig. S3). While the organization of these
interstitial heterochromatic blocks was virtually identical in com- 
position to chimpanzee subterminal caps, none of these chromo- 
some 7 or 13 probes produced any signals in the caps or subcaps 
of the chimpanzee chromosomes, suggesting an independent and 
unidirectional transfer of sequence from the subterminal cap region 
to euchromatin. Notably, the integration sites correspond to
human-chimpanzee and human-chimpanzee-gorilla segmentally duplicated
regions in chromosome VII and chromosome XIII, respectively.
Further, in the case of chromosome VII, the insertion point
maps within 1 Mbp from an inversion breakpoint specific to the
ancestral lineage of human and chimpanzee (Ventura et al. 2011).
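
One way to read the breakpoint intervals reported above is as the region shared by the two flanking fosmid probes. The sketch below assumes exactly that (the interval is the overlap of the two probe coordinates); the function name `overlap` is illustrative, and the text does not state that this is how the intervals were computed.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared sub-interval of two coordinate spans, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


# Probe coordinates as quoted in the text.
chr7 = overlap((101_930_395, 101_963_680), (101_951_324, 101_986_595))
chr13 = overlap((45_906_763, 45_941_160), (45_935_232, 45_970_371))

for label, interval in (("chr7", chr7), ("chr13", chr13)):
    if interval is not None:
        length_bp = interval[1] - interval[0]
        print(f"{label}: {interval[0]}-{interval[1]} ({length_bp} bp)")
```

Under this assumption the overlaps are 12,356 bp on chromosome 7 and 5,928 bp on chromosome 13, in line with the 12.3-kbp and 5.9-kbp intervals quoted in the text.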



Copy number variation between and within species 

We performed two sets of experiments to assess the extent of variation
between and within species. To assess qualitative differences
in the chromosomal distribution, we first performed cohybridization
experiments on metaphase chromosomal spreads from two
gorillas, two chimpanzees, and one bonobo. Marked differences
in the distribution and signal intensity were observed, especially
when comparing homologous chromosomes in gorilla (Figs. 3B, 4),
where subterminal regions showed differences in the organization
of the cap (see chromosomes XIX and XXII). The two common
chimpanzees and the bonobo showed a much more uniform distribution
of probes, with the exception of the interstitial chromosome
VII heterochromatic block, which appears less complex in bonobo
when compared to the two common chimpanzees (Figs. 3B, 4).

Figure 4. Summary of FISH experiments in human, two common chimpanzees, one pygmy chimpanzee, and two gorillas. Colored circles represent
locations of the probes in great ape chromosomes; half circles represent heterozygous signals. Difference in interstitial heterochromatin on chromosome
VII is displayed between common and pygmy chimpanzees. (PPA) Pan paniscus.

To obtain a quantitative assessment of copy number varia- 
tion, we performed read-depth-based copy number predictions 
as previously described in Sudmant et al. (2010). Briefly, next- 
generation sequence reads were mapped to reference sequences 
and read-depth was used to estimate copy number after correcting 
for GC-bias. While the efficacy of this approach has been validated 
for low- to middle-copy duplications, the dynamic range response 
of high-copy repeats had not been previously tested. To assess this 
more formally, we initially estimated the median copy number for 
various subterminal cap regions in three species (human, chim- 
panzee, and gorilla) and compared these values to the number and 
relative fluorescence intensity normalized by the value of maxi- 
mum fluorescence (VMF = 255). The relative fluorescence intensity 
and read-depth copy number for each locus in each species were 
highly correlated (r² = 0.93-0.99) (Supplemental Table S3; Fig. 5;
Supplemental Fig. S4), suggesting that copy number, even for high- 
copy repeats, may be accurately estimated. 
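
The general idea of a read-depth copy number estimate, and of comparing it with fluorescence intensity, can be sketched as follows. This is not the procedure of Sudmant et al. (2010): the GC correction here is a simple per-bin median rescaling chosen for illustration, and all function names, the bin width, and the example numbers are assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_correct(depths, gc_fractions, bin_width=0.05):
    """Rescale window depths so that every GC bin has the same median depth."""
    bins = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        bins[int(gc / bin_width)].append(depth)
    overall = median(depths)
    factor = {b: overall / median(v) for b, v in bins.items() if median(v) > 0}
    return [d * factor.get(int(gc / bin_width), 1.0)
            for d, gc in zip(depths, gc_fractions)]


def copy_number(region_depths, control_depths, control_copies=2):
    """Copy number = control copies * (region depth / diploid control depth)."""
    return control_copies * median(region_depths) / median(control_depths)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical depths: windows in a duplicated region versus diploid controls.
print(copy_number([450, 460, 440], [30, 29, 31]))
```

In practice the estimated copy numbers for each probe would be paired with the normalized fluorescence values and passed to a correlation function such as `r_squared` above.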

We estimate the median species copy number for various 
subcap regions ranges from 29 to 64 in chimpanzee with greater 
diversity and lower copy number (12-55) than observed in gorilla 
(Supplemental Table S3). We estimate the median copy number of the
hyperexpanded segments within the subterminal cap to be 798
in chimpanzee compared to 1092 in gorilla. In contrast, all analyzed
human genomes, as well as the Neanderthal, were either
predicted to be diploid for the corresponding segments or showed
only modest evidence of duplication (<14 copies). Adjusting for
the size of the probes and differences in copy number for each
segment, the amount of segmental duplication added to the cap
and subcap regions is strikingly similar (~46 Mbp) in both gorilla
and chimpanzee (Supplemental Table S3). This value exceeds the
initial estimates of StSat sequence by more than 10-fold.
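
The adjustment described above (segment size multiplied by copy number) can be illustrated with a short sketch. It assumes that the added sequence per segment is its length times the copies in excess of a diploid baseline of two; the segment lengths and copy numbers below are hypothetical and are not taken from Supplemental Table S3.

```python
from typing import Iterable, Tuple

Segment = Tuple[str, int, float]  # (label, length in bp, estimated copy number)


def added_duplication_bp(segments: Iterable[Segment],
                         baseline_copies: float = 2.0) -> float:
    """Sum of length * (copies above the assumed diploid baseline)."""
    return sum(length * max(copies - baseline_copies, 0.0)
               for _, length, copies in segments)


# Hypothetical segments, for illustration only.
hypothetical_segments = [
    ("cap segment", 40_000, 800),
    ("subcap segment", 50_000, 42),
]
total_bp = added_duplication_bp(hypothetical_segments)
print(f"{total_bp / 1e6:.2f} Mbp added")

# Ratio of the ~46 Mbp reported here to the >3 Mbp dot-blot estimate
# quoted earlier in the text (an upper bound on the fold difference).
fold_over_stsat = 46 / 3
print(f"about {fold_over_stsat:.0f}-fold")
```

With the figures from the text, 46 Mbp against an StSat estimate of just over 3 Mbp gives a ratio of roughly 15, consistent with the statement of a more than 10-fold excess.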

Sequence analyses 

Since the subcap and cap regions of gorilla and chimpanzee chro- 
mosomes are missing from the corresponding primate genome 
assemblies, we identified large-insert clones by hybridization and 
sequenced the inserts using capillary-based sequencing methods. 
We also used BLAST to search for sequence similarity with StSat
(contained in the pCht7 and pCht13 clones) (Royle et al. 1994) for
GenBank accessions corresponding to BAC clones carrying these
sequences within the nucleotide (nr/nt) and high-throughput genomic
sequence (HTGS) collections. We identified 17 chimpanzee
and nine gorilla clones whose inserts had been sequenced and
had tracts of StSat sequence (Supplemental Table S4). The corresponding
sequences were annotated for StSat content, the presence of
segmental duplications in human, chimpanzee, and gorilla, and
homology with the human reference genome using a suite of
computational programs (Methods).

Our analysis showed that all 26 BAC sequences were highly
duplicated in chimpanzee and gorilla genomes and carried both
species-specific as well as ancestrally shared duplicated sequences
(Fig. 6A,B). None of the BACs, however, was composed solely of
satellite sequences, but rather they showed a patchwork of segmental
duplications mixed with tracts of StSat sequences. While
variable in size, most of the satellite tracts were >20 kbp, with
some exceeding 60 kbp within a given BAC (Fig. 6A,B; Supplemental
Fig. S5A,B). It should be noted, in this regard, that nine
out of the 26 BACs failed to completely assemble or assembled
into sequence contigs substantially shorter than the typical BAC
insert length for these libraries, indicating difficulty in cloning and
sequence assembly of these regions.

Figure 5. Distribution of copy number counts versus fluorescence. Copy number estimates from sequence read-depth for each genomic region correlate
strongly (r² = 0.99) with measured fluorescence intensity for every probe distributed on chromosome 2 and chromosome 10 (see Supplemental Table S3).
Red arrows show the location of the single probe positive for cap, while blue arrows display the group of probes positive for subcap in (A) chimpanzee and
(B) gorilla (left panels). The correlations are weaker when hyperexpanded segments are excluded from the analysis (right panels).

Four chimpanzee BAC clones showed a relatively unique
architecture: segmental duplications from human chromosome 10
as well as several locations on chromosome 7, including chr7:
101708105-102032998. The latter region was previously defined as
flanking the interstitial heterochromatic block specific to the
chimpanzee genome. Copy number sequence read-depth analysis
predicts a dramatic increase in copy number transitioning from the
chromosome VII to chromosome X regions consistent with the
transition point from euchromatin to the interstitial heterochromatin
in the chimpanzee genome (Supplemental Fig. S6).

Although we identified between 20 and 30 distinct segmental
duplications associated with StSats, several were observed much
more frequently. Eight out of 17 chimpanzee BAC clones carried
segmental duplications corresponding to the human chromosome
2 region 2B, while four of those eight showed similarity to the
chromosome 10 region 10B (Fig. 2). Similarly, seven out of nine
gorilla BAC clones showed similarity to region 10C, flanked by StSat,
consistent with previous FISH and copy number estimates. Such
shared paralogous sequences are particularly valuable because they
provide an opportunity to assess the phylogenetic relationship
among various copies between and within species.

Figure 6. Genomic organization based on large-insert clone sequencing. Sequence organization shown for large-insert BAC clones that carry StSat
sequences in (A) chimpanzee and (B) gorilla. Sequences were annotated for the presence of duplications (whole-genome shotgun sequence detection

1044 Genome Research

www.genome.org

Gorilla/chimpanzee subterminal sequence evolution

We specifically searched for shared sequences located within 
10 kbp of StSat sequences in an effort to date the expansion of StSat 
sequences during hominid evolution. We identified four regions: 
two mapping to human chromosome 2 (39.6 and 10.0 kbp in 
length) and two mapping to chromosome 10 (11.8 and 8.2 kbp in 
length) where we had successfully resolved seven to nine distinct 
copies among the chimpanzee and gorilla clone sequences. Homol- 
ogous sequences from these BACs and the human and orangutan 
reference genomes were extracted, aligned (ClustalW), and phyloge- 
netic trees were constructed (MEGA4 maximum likelihood). We 
used orangutan as an outgroup because these loci were single-copy 
in this species and mapped, in most cases, to the ancestral locus. 

The phylogenetic analyses generally support independent 
expansions of the duplicated sequences in chimpanzee and gorilla. 
Using an estimated orangutan divergence of 14 million years ago 
(mya) from the human lineage, we can approximate these ex- 
pansions to have initiated between 5 and 7 mya. The topology 
of the trees suggests that the spread of the subterminal segmental 
duplications was continuous within each lineage (as indicated by 
the varying branch length and terminal subclades) and accumu- 
lated over a long period of evolutionary time. Interestingly, only 
one of the paralogous segments (a chromosome 10 segment) was 
shared at the sequence level between both gorilla and chimpanzee 
subterminal clones. Phylogenetic analysis (Fig. 7; Supplemental
Table S5) provides compelling evidence that this segment was
duplicated in the common ancestor of chimpanzee and gorilla
(7-8 mya) but then subsequently and gradually expanded to the
subterminal caps of both chimpanzee and gorilla. These duplications
are not present in contemporary humans, potentially due to
subsequent loss in the ancestral lineage.
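
Dating a duplication against the orangutan calibration can be sketched as a simple proportional scaling. The example assumes a strict molecular clock, under which a node's age is proportional to its depth in the tree relative to the depth of the human-orangutan split (taken as 14 mya, as in the text). The depth values are hypothetical, and this is an illustration of the principle rather than the procedure used to produce Figure 7.

```python
def node_age_mya(node_depth: float, calibration_depth: float,
                 calibration_age_mya: float = 14.0) -> float:
    """Scale a node depth (substitutions/site) to time under a strict clock."""
    if calibration_depth <= 0:
        raise ValueError("calibration depth must be positive")
    return calibration_age_mya * node_depth / calibration_depth


# Hypothetical depths in substitutions per site (illustrative only).
ORANGUTAN_SPLIT_DEPTH = 0.0175
example_duplication_depth = 0.0090

age = node_age_mya(example_duplication_depth, ORANGUTAN_SPLIT_DEPTH)
print(f"{age:.1f} mya")
```

With these illustrative depths the duplication would be placed at about 7.2 mya; real estimates depend on the alignment, the substitution model, and how well the clock assumption holds.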

Discussion 

In this study, we characterize the genomic organization and evolu- 
tion of the chromosomal caps of gorilla and chimpanzee chromo- 
somes. We operationally distinguish two genomic architectures — 
cap regions, consisting of hyperexpanded arrays of human chro- 
mosome 2 and 10 segmental duplications (chimpanzee and gorilla, 
respectively) interspersed with StSat (Fig. 2), and subcap regions 
composed of lower-copy segmental duplications admixed with 
occasional tracts of StSat. This subterminal heterochromatin and 
the corresponding transition regions are absent from the genomes 
of both human and orangutan. Phylogenetic and cytogenetic data 
suggest that the genomic architecture of these regions began to 
form in the common ancestor of human, chimpanzee, and gorilla
~7-8 mya (Fig. 7A). In contrast, we estimate that the expansion
and spread of cap sequences to multiple chromosomes occurred
soon after speciation in gorilla and chimpanzee (4-5 mya), albeit
independently, with a reciprocal definition of cap and subcap sequences
in each lineage. It is unclear why the progenitor cap and
subcap sequences were quiescent for 3 million years, but the emergence
of the StSat sequence must have been important in generating
this instability since it and the chromosome 2 and 10 segmental
duplications are the only common sequence elements. We
hypothesize that chromosome II rearrangements played a central 
role in triggering this instability and specifying differences in the 
evolutionary trajectory in human, chimpanzee, and gorilla.

Figure 7. Phylogenetic analysis. Sequences shared among distinct subterminal genomic loci in chimpanzee (blue) and gorilla (red) were aligned to
human (NCBI build35) and orangutan (ponAbe2) orthologous sequences (see text for details). Neighbor-joining phylogenetic trees (bootstrap values at
each node) were constructed from four different genomic regions (A-D). Only the chromosome 10 segment (A) is shared between gorilla and chimpanzee
cap sequences. We estimate the evolutionary time (green dots = mya) of duplications and expansions assuming a human-orangutan divergence of 14 mya.

Three evolutionary large-scale chromosomal rearrangements
involving chromosome II over the last 12 million years of hominid
evolution have been previously described (Fig. 8), namely: (1)
a pericentric inversion within acrocentric chromosome IIq in the
gorilla-chimpanzee ancestor after separation from orangutan (Yunis
and Prakash 1982; Roberto et al. 2008; Ventura et al. 2011), (2)
a pericentric inversion on chromosome IIp in the human-chimpanzee
ancestor after divergence from gorilla lineage (Wienberg
et al. 1994; Roberto et al. 2008), and (3) a fusion of IIp and IIq
specifically in the ancestral lineage of human to create chromosome
2 (Yunis and Prakash 1982; Wienberg et al. 1994).

To explain our results, we propose the following possible
model (Fig. 8). A segment of chromosome X was duplicatively
transposed to the short arm of chromosome IIq in the common
ancestor of human-chimpanzee-gorilla. The initial IIq inversion
placed centromeric satellite sequences in close proximity to this
segment, potentiating the emergence of StSat sequences. The second
inversion of chromosome IIp occurred in the ancestor of
human and chimpanzee after separation from the gorilla ancestor.
This event internalized the protective SatIII sequence (Fig. 8) and
effectively rendered chromosome IIp "exposed" to the spread of
duplicated sequences. (Note: Sequence similarity searches detect
vestigial SatIII sequence homology specifically at the centromeric
breakpoint of this large-scale inversion in human chromosome 2p.)
We hypothesize that the existence of a smaller local inversion in
ancestral human-chimpanzee IIq altered the order of chromosome 2
and 10 segments, placing the chromosome 2 segment adjacent to the
StSat and, as a result, susceptible to hyperexpansion. A key step
was the spread of the StSat and potentially duplicated sequences to
the distal end of the short arm of chromosome IIp in the human and
chimpanzee ancestor, where additional events occurred, such as
the duplicative transposition of a chromosome IX segment (segment
2A) (Fig. 8; Table 1). In this model, both IIp and IIq could serve
as sources for the spread of these sequences to other telomeric
regions and the formation of subterminal heterochromatin. In the
human ancestor, subterminal heterochromatin was circumvented by
the fusion of chromosome IIp and IIq, leading to the deletion of
this unstable sequence from the human lineage. It is possible that
the presence of StSat sequence along with sequence homology may
have, in fact, played a role in triggering this event.

This scenario would explain (1) the 
absence of subterminal heterochromatin 
in humans, (2) the differential distribu- 
tion of subterminal heterochromatin in 
chimpanzee and gorilla chromosomes, 
and (3) differences in the composition 
of the cap and subcap sequences — in 
particular with respect to the hyperex- 
pansion of chromosome 2 and 10 se- 
quences in gorilla and chimpanzee, re- 
spectively (Cheng et al. 2005; Marques- 
Bonet et al. 2009). In our model, both 
the evolution and spread of subterminal 
heterochromatin was inextricably linked 
to large-scale chromosomal rearrange- 
ments that catalyzed its formation. In 
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contrast, the fusion of chromosome Hp and Ilq squelched this 
process in humans by eliminating the source of the StSat sequence. 
Since the phylogenetic analyses of both human and chimpanzee 
suggest the spread of these segments occurred >4 mya, it follows 
that the fusion of chromosomes IIp and IIq occurred early during
evolution; otherwise, subterminal heterochromatin would have 
evolved within our lineage. 

We recognize, however, that this is only one of many possible 
explanations that might account for the observed differences 
among human, chimpanzee, and gorilla subterminal regions. It
is also possible that functional constraints, such as the expression
or suppression of critical genes in humans,
contributed to these dramatic differences between humans and
great apes. Human diseases such as facioscapulohumeral muscular
dystrophy (FSHD), for example, arise from chromatin deregulation
(Lemmers et al. 2010) near the telomere of human
chromosome 4. From an evolutionary perspective, it is known that 
specific gene families have expanded immediately proximal to 
the telomeres of human chromosomes. Of particular relevance in 
this regard is the Wiskott-Aldrich Syndrome Protein and SCAR 
Homolog (WASH) gene family (Linardopoulou et al. 2007). This 
gene family has expanded within the subtelomeric regions,
with a copy located specifically at the site of the chromosome 2 
fusion. The function of the specific family members is not known, 
but as a class, they are thought to be important in the reorganization 
of the actin cytoskeleton in filopodia protrusions (Linardopoulou 
et al. 2007). 

Among hominids, chimpanzees are the only lineage wherein 
heterochromatic blocks developed interstitially (chromosomes VII 
and XIII). Our sequence and FISH analyses indicate that the 
composition of these regions is indistinguishable from subtermi- 
nal heterochromatin. Refined mapping of the breakpoints reveals 
that the interstitial heterochromatin formed within segmentally 
duplicated regions shared between human and chimpanzee for 
chromosome VII and between human, chimpanzee, and gorilla for 
chromosome XIII. Thus, it is quite likely that homology between 
these interstitial blocks of segmental duplication and the sub- 
telomeric region provided a gateway for the movement of StSat 
and chromosome 2 segments internally (IJdo et al. 1991; Hirai et al. 
2005). Once present, these sequences amplified, creating hetero- 
chromatic interstitial G-band positive regions in this species. The 
fact that the chromosome VII interstitial heterochromatin has not 
fixed in the chimpanzee population may indicate that these in- 
terstitial "colonizations" occurred relatively recently (Hirai et al. 
2005) or, alternatively, are being deleted. 

Hirai and colleagues (2005) described a retrotransposable 
compound repeat DNA organization (RCRO) associated with sub- 
terminal heterochromatin. The authors hypothesized that this 
sequence was important in inducing and prolonging bouquet 
formation during meiotic prophase specifically in chimpanzee 
when compared to human and other primate chromosomes. Our 
data reveal a much more complex organization of subterminal 
heterochromatin that involved segmental duplication as opposed 
to retrotransposition, although it is possible that the RCRO originally
described is part of this architecture. Our copy number estimates
and sequence analysis (Fig. 5) suggest that StSat and segmental
duplication expansions have added ~45-46 Mbp of highly
identical (1%-2% divergent) sequences to the ends of many gorilla
and chimpanzee chromosomes. 

It has been previously reported that the presence of large 
blocks of heterochromatin at the ends of chromosomes may alter 
patterns of meiotic recombination (Miklos and Nankivell 1976). In
chimpanzee, it is, thus, possible that this architecture directly interferes
with synapsis during meiosis, resulting in an overall reduction
of the total number of chiasmata and a prolonged association
of telomeric regions during zygotene and pachytene. Indeed,
preliminary findings by Hirai and colleagues reported a lower
mean chiasma frequency in chimpanzee when compared to human
(Hirai et al. 2005). While it is now known that the pattern of
fine-scale recombination differs significantly between human and 
chimpanzee (Ptak et al. 2005), it will be interesting to determine 
whether this difference is most pronounced in close proximity to 
regions of subterminal heterochromatin. This will require both 
higher quality sequence and fine-scale mapping of meiotic cross- 
over events in these complex regions of human and great ape 
chromosomes. 

Methods 

Fluorescent in situ hybridization 

Gorilla and chimpanzee BAC and human fosmid clones were used
as probes in FISH assays on human and great ape metaphase 
spreads. Metaphases from nonhuman primates (common chim- 
panzee, gorilla, bonobo, and rhesus monkey) were obtained from 
lymphoblastoid or fibroblast cell lines; human metaphase spreads 
were obtained from PHA-stimulated peripheral lymphocytes of 
normal donors by standard procedures. DNA extraction from BACs 
and fosmids was performed as already reported (Ventura et al. 
2001). FISH experiments were essentially performed as previously 
described (Ventura et al. 2004). Briefly, DNA probes were directly 
labeled with Cy3-dUTP, Cy5-dUTP (GE Healthcare), or Fluorescein-dUTP
(Invitrogen) by nick translation. Two hundred ng of labeled
probe were hybridized on metaphase spreads; hybridization was
performed overnight at 37°C in 2x SSC, 50% (v/v) formamide,
10% (w/v) dextran sulphate, 5 µg COT1 DNA (Roche), and 3 µg
sonicated salmon sperm DNA, in a volume of 10 µL. Post-hybridization
washing was performed at 60°C in 0.1x SSC (three times,
high stringency). Washes of interspecies hybridization experiments
were performed at lower stringency: 37°C in 2x SSC, 50%
formamide (x3), followed by washes at 42°C in 2x SSC (x3).
Chromosome identification was obtained by DAPI staining, pro- 
ducing a Q-banding pattern. Digital images were obtained using 
a Leica DMRXA epifluorescence microscope equipped with a cooled 
CCD camera (Princeton Instruments). Cy3, Cy5, Fluorescein, and 
DAPI fluorescence signals, detected with specific filters, were re- 
corded separately as grayscale images. Pseudocoloring and merging 
of images were performed using Adobe Photoshop software. 

Stretched chromosomes 

Chimpanzee and gorilla chromosomes were stretched mechan- 
ically. Colcemid-treated lymphoblastoid cells were washed in 
phosphate-buffered saline (PBS), counted, and resuspended in 
a hypotonic solution (75 mM KCl, approximately 60,000 cells/mL) for 15 min.
Then, 0.5 mL of the suspension was cytocentrifuged (Shandon 
Cytospin 3 centrifuge) onto coated glass slides (Thermo Shandon 
Double Cytoslide) at 800 rpm for 4 min and fixed in methanol at
-20°C for 15 min and in methanol:acetic acid 3:1 for 30 min. The
slides were aged at 80°C for 1 h before hybridization. 

Oligo-FISH and PNA probes

Biotinylated primers were designed on the StSat sequence 
(5'-TCCATGTTTATACAGATAGCGGTG-3'-Bio) and directly used as
probes (0.001 µmol) in FISH experiments on chimpanzee and
gorilla metaphase spreads. Probe and target were denatured at
74°C for 6 min and hybridized overnight at 37°C in 2x SSC, 50% (v/v)
formamide, 10% (w/v) dextran sulphate, and 3 µg sonicated
salmon sperm DNA, in a volume of 10 µL. Slides were quickly
washed in 2x SSC, 50% formamide at room temperature, and
signals were detected (detection buffer: 1% BSA/1x SSC/0.1%
Tween 20) with avidin-FITC at 37°C for 1 h. Excess fluorochrome
was washed away in 4x SSC/0.1% Tween 20 at room temperature, and
chromosomes were DAPI stained.

PNA probes hybridized to denatured telomeric sequences in 
cells permeabilized in hot formamide. Two µL of ready-to-use Cy3-conjugated
telomere PNA probe (Dako, Telomere PNA FISH Kits)
were used as probes in fluorescent in situ cohybridization assays 
to evaluate the respective localization of telomeric sequences and 
the StSat on gorilla and chimpanzee chromosomes. 

Satellite III and rDNA chromosomal distribution 

To determine the chromosomal distribution, primer sets were 
designed from the known sequences of each SatIII subfamily
(Bandyopadhyay et al. 2001a,b) (pE-1: GATTCGATTCCATTGCACTCG-GGACTGAAACAAAATGGAGACC;
pE-2: ATGCAGCCTGGGTGACCT-AAGAATCCATACCACACC;
pR-1: TGTGCCTCTGTGTTACAT-ACTGCCATCCTTTCCACC;
pR-2: ACGCTGGGTGATGGAGTGAAATAC-ACTCCATTTCATTCCGCCGC;
pR-4: TAAGCGTGGAATGGGTTTGAGC-CATCCGATTCCATTTCACTAC;
pK-1: ATCGAATGGATTCCTAATTG-CGATATCTTCTGTTACACG;
pW-1: AATGGGATGGAACCGAGTGG-CCTTTCATTTCAAGTCCCTTCGC) and used
in polymerase chain reactions (PCR) on
human, chimpanzee, and gorilla genomic DNA. Amplified products
were then directly labeled with Fluorescein-dUTP by PCR labeling
and used as probes in FISH assays. The use of PCR labeling
avoids the possible contamination from genomic DNA that can occur with nick
translation labeling of PCR products. PCR labeling was carried out
in a final volume of 20 µL that contained 100 ng PCR product, 2 µL
10x reaction buffer (Invitrogen), 2 µL 50 mM MgCl2, 0.5 µL each
primer (10 µM), 0.5 µL 2 mM dACG, 2.5 µL 1 mM Fluorescein-dUTP,
5 µL 2% BSA, and 0.3 µL Taq polymerase (5 U/µL). For both
amplification reactions, the cycling parameters used were as follows:
3 min initial denaturation at 94°C, followed by 30 cycles of
94°C for 30 sec, 56°C for 30 sec, and 72°C for 30 sec. Final extension
was at 72°C for 10 min.
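
For readers who want to check the SatIII primer list programmatically, the following is a minimal Python sketch, not part of the original protocol. It assumes that each entry in the list above reads as forward primer, hyphen, reverse primer, and that the sequences are correct as printed. The names `SATIII_PRIMERS`, `reverse_complement`, `gc_fraction`, and `summarize` are hypothetical and introduced only for illustration.

```python
# Primer pairs transcribed from the SatIII list in the text.
# ASSUMPTION: each printed entry is "forward-reverse", split at the hyphen.
SATIII_PRIMERS = {
    "pE-1": ("GATTCGATTCCATTGCACTCG", "GGACTGAAACAAAATGGAGACC"),
    "pE-2": ("ATGCAGCCTGGGTGACCT", "AAGAATCCATACCACACC"),
    "pR-1": ("TGTGCCTCTGTGTTACAT", "ACTGCCATCCTTTCCACC"),
    "pR-2": ("ACGCTGGGTGATGGAGTGAAATAC", "ACTCCATTTCATTCCGCCGC"),
    "pR-4": ("TAAGCGTGGAATGGGTTTGAGC", "CATCCGATTCCATTTCACTAC"),
    "pK-1": ("ATCGAATGGATTCCTAATTG", "CGATATCTTCTGTTACACG"),
    "pW-1": ("AATGGGATGGAACCGAGTGG", "CCTTTCATTTCAAGTCCCTTCGC"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(primers):
    """Return (subfamily, orientation, length, GC fraction) for every primer."""
    rows = []
    for name, (forward, reverse) in primers.items():
        for label, seq in (("forward", forward), ("reverse", reverse)):
            # Reject anything that is not a plain A/C/G/T string, which
            # would point to a transcription error.
            if set(seq) - set("ACGT"):
                raise ValueError(f"unexpected character in {name} {label}")
            rows.append((name, label, len(seq), round(gc_fraction(seq), 2)))
    return rows


if __name__ == "__main__":
    for name, label, length, gc in summarize(SATIII_PRIMERS):
        print(f"{name}\t{label}\t{length} nt\tGC={gc}")
```

A summary of this kind can flag transcription errors (unexpected characters or implausible lengths) before primers are ordered; it does not replace checking the sequences against Bandyopadhyay et al. (2001a,b).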

In order to assess rDNA localization in great apes, clones 
containing rDNA were extracted by a BLAST sequence similarity 
search of the human rDNA complete repeating unit (accession: 
U13369.1) against the HTGS database. Several human BAC clones
were selected and tested; the clone giving the most consistent and
reliable hybridization signals was chosen as a probe for the cohybridization
experiments: CH507-159O11.

BAC sequencing

Chimpanzee and gorilla clones were selected for complete insert
sequencing using capillary sequencing methods (McPherson et al. 
2001) in order to obtain high-quality finished sequence within 
duplicated regions. Rearrangements were visualized using Miropeats 
(Parsons 1995) and previously described in-house visualization tools 
(Kidd et al. 2010). 

Data access 

A subset of clones was selected for complete insert sequencing. The 
sequences have been deposited in GenBank (http://www.ncbi.nlm.nih.gov/genbank)
under accession nos. AC198877.1, AC183608.3,
AC192975.3, AC097005.1, AC194646.4, AC200072.3, AC191679.2,
AC183292.1, AC150905.2, AC152419.2, AC204739.3, AC192628.1,
AC196285.4, AC213004.6, AC184708.3, AC192825.3, AC145493.3,
AC239281.3, AC241375.2, AC239628.1, AC239393.3, AC239711.2,
AC241376.2, AC241501.1, AC239640.2, AC239931.2.